C²Former: Calibrated and Complementary Transformer for RGB-Infrared Object Detection
Object detection on visible (RGB) and infrared (IR) images, an emerging
solution for robust around-the-clock detection, has received extensive
attention in recent years. With the help of IR images, object detectors
become more reliable and robust in practical applications by exploiting
combined RGB-IR information. However, existing methods still suffer from
modality miscalibration and fusion imprecision. Since the transformer has a
powerful capability to model pairwise correlations between different
features, in this paper we propose a novel Calibrated and Complementary
Transformer, called C²Former, to address these two problems simultaneously.
In C²Former, we design an Inter-modality Cross-Attention (ICA) module to
obtain calibrated and complementary features by learning the cross-attention
relationship between the RGB and IR
modalities. To reduce the computational cost of computing global attention in
ICA, an Adaptive Feature Sampling (AFS) module is introduced to decrease the
dimensionality of the feature maps. Because C²Former operates in the feature
domain, it can be embedded into existing RGB-IR object detectors via the
backbone network. Thus, one single-stage and one two-stage object detector,
each incorporating our C²Former, are constructed to evaluate its
effectiveness and versatility. With extensive experiments on the
DroneVehicle and KAIST RGB-IR datasets, we verify that our method can fully
utilize the RGB-IR complementary information and achieve robust detection
results. The code is available at
https://github.com/yuanmaoxun/Calibrated-and-Complementary-Transformer-for-RGB-Infrared-Object-Detection.git
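The core idea of this abstract, cross-attention in which each modality queries the other over spatially sampled feature maps, can be illustrated with a short PyTorch sketch. This is a minimal reading of the abstract, not the authors' implementation: the class name InterModalityCrossAttention, the pooling-based stand-in for the AFS module, and all sizes are assumptions.

```python
# Minimal sketch of inter-modality cross-attention between RGB and IR
# backbone features. The adaptive pooling is only a stand-in for the
# paper's AFS module; names and hyperparameters are illustrative.
import torch
import torch.nn as nn

class InterModalityCrossAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8, sample_size: int = 16):
        super().__init__()
        # Stand-in for AFS: shrink the spatial grid before global attention,
        # whose cost is quadratic in the number of tokens.
        self.sample = nn.AdaptiveAvgPool2d(sample_size)
        self.attn_rgb = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_ir = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    @staticmethod
    def _to_tokens(x: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, H*W, C): one token per spatial location.
        return x.flatten(2).transpose(1, 2)

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor):
        rgb_t = self._to_tokens(self.sample(rgb))
        ir_t = self._to_tokens(self.sample(ir))
        # Each modality queries the other, so the outputs carry calibrated,
        # complementary cross-modal context.
        rgb_out, _ = self.attn_rgb(query=rgb_t, key=ir_t, value=ir_t)
        ir_out, _ = self.attn_ir(query=ir_t, key=rgb_t, value=rgb_t)
        return rgb_out, ir_out

# Usage on backbone feature maps of matching shape:
rgb_feat = torch.randn(2, 256, 64, 64)
ir_feat = torch.randn(2, 256, 64, 64)
rgb_fused, ir_fused = InterModalityCrossAttention(dim=256)(rgb_feat, ir_feat)
print(rgb_fused.shape)  # torch.Size([2, 256, 256]): 16*16 tokens, 256 channels
```

Because the module consumes and produces per-modality feature tokens, it can sit behind the backbone of either a single-stage or a two-stage detector, which matches the plug-in usage the abstract describes.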
Learning to Pan-sharpening with Memories of Spatial Details
Pan-sharpening, one of the most commonly used techniques in remote sensing
systems, aims to inject spatial details from panchromatic (PAN) images into
multi-spectral (MS) images to obtain high-resolution MS images. Since deep
learning (DL) has received widespread attention for its powerful fitting
ability and efficient feature extraction, a variety of DL-based
pan-sharpening methods have been proposed and achieve remarkable performance.
However, current pan-sharpening methods usually require paired PAN and MS
images as input, which limits
their usage in some scenarios. To address this issue, we observe in this
paper that the spatial details from PAN images are mainly high-frequency
cues, i.e., edges that reflect the contours of the input PAN images. This
motivates us to develop a PAN-agnostic representation that stores a set of
base edges, from which the contour of the corresponding PAN image can be
composed. As a result, we can perform the pan-sharpening task with only the
MS image at inference time. To this end, a
memory-based network is adapted to extract and memorize the spatial details
during the training phase and then replaces the process of obtaining spatial
information from PAN images at inference; we call it the Memory-based Spatial
Details Network (MSDN). We finally integrate the proposed MSDN module into
existing DL-based pan-sharpening methods to obtain an end-to-end
pan-sharpening network. With extensive experiments on the Gaofen-1 and
WorldView-4 satellites, we verify that our method constructs good spatial
details without PAN images and achieves the best performance. The code is
available at
https://github.com/Zhao-Tian-yi/Learning-to-Pan-sharpening-with-Memories-of-Spatial-Details.git
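To make the memory mechanism concrete, here is a hedged PyTorch sketch of a learnable bank of base-edge prototypes that MS features soft-address at inference, so that no PAN image is needed. Everything here (SpatialDetailMemory, num_slots, the scaled dot-product addressing) is an illustrative assumption, not the released MSDN code.

```python
# Sketch of a learnable memory of base-edge prototypes addressed by MS
# features alone; names and the addressing scheme are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialDetailMemory(nn.Module):
    def __init__(self, dim: int = 64, num_slots: int = 128):
        super().__init__()
        # Learnable bank of base-edge prototypes; training should shape the
        # slots so that mixtures of them compose PAN-like contours.
        self.memory = nn.Parameter(torch.randn(num_slots, dim))

    def forward(self, ms_feat: torch.Tensor) -> torch.Tensor:
        # ms_feat: (B, C, H, W) features extracted from the MS image alone.
        b, c, h, w = ms_feat.shape
        q = ms_feat.flatten(2).transpose(1, 2)              # (B, H*W, C)
        # Soft addressing: each spatial location retrieves a weighted mix
        # of stored edge prototypes.
        weights = F.softmax(q @ self.memory.t() / c ** 0.5, dim=-1)
        detail = weights @ self.memory                      # (B, H*W, C)
        return detail.transpose(1, 2).reshape(b, c, h, w)

# At inference no PAN image is required; the retrieved detail map can be
# injected back into a DL-based fusion network.
ms_feat = torch.randn(2, 64, 32, 32)
detail_map = SpatialDetailMemory(dim=64, num_slots=128)(ms_feat)
print(detail_map.shape)  # torch.Size([2, 64, 32, 32])
```

During training, such a module would see PAN-derived high-frequency targets so the slots learn reusable edge primitives; at test time only the MS branch runs, matching the abstract's claim of PAN-free inference.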